biplotEZ

User-friendly biplots in R



Centre for Multi-Dimensional Data Visualisation (MuViSU)
muvisu@sun.ac.za



SASA 2024

What are biplots?

  • The biplot is a powerful and versatile data visualisation tool.

  • Biplots make the information in a table of data transparent, revealing the main structures in the data in a methodical way, for example patterns of correlation between variables or similarities between observations.

  • A biplot generalises the two-dimensional scatter diagram to data that exist in a higher-dimensional space, displaying information on both samples and variables graphically.

  • Different types of biplots are based on different multivariate data analysis techniques.

Flow of functions in biplotEZ

Main Function

biplot()

Type of Biplot

PCA()
CVA()
PCO()
CA()

Aesthetics

samples()
axes()
newsamples()
newaxes()

Operations

prediction()
interpolate()
translate()
density()
fit.measures()
classify()
alpha.bags()
ellipses()
rotate()
reflect()
zoom()
regress()
splines()

Plotting

plot()

First step to create a biplot

biplot()

First step to create a biplot

biplot(data = iris,
       )

First step to create a biplot

biplot(data = iris,
       group.aes = iris[,5], 
       Title = "My first biplot"
       )
# Object of class biplot, based on 150 samples and 5 variables.
# 4 numeric variables.
# 1 categorical variable.
Argument   Description
data       A data frame or matrix containing all variables the user wants to analyse.
classes    A vector identifying class membership; required for CVA biplots.
group.aes  A variable from the data to be used as a grouping variable.
center     A logical value indicating whether the data should be column centred, with default TRUE.
scaled     A logical value indicating whether the data should be standardised to unit column variances, with default FALSE.
Title      The title of the biplot to be rendered.
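As a sketch (assuming the biplotEZ package is installed and loaded), several of these arguments can be combined in a single call; the object returned can then be piped on to the other functions in the flow diagram:

```r
library(biplotEZ)

# Centred and standardised analysis of iris, grouped by species
bp <- biplot(data = iris, group.aes = iris[, 5],
             center = TRUE, scaled = TRUE,
             Title = "Standardised iris biplot")
```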

Theory behind constructing a PCA biplot

Data: \({\bf{X}}\)

#          X1       X2        X3
# 1  5.418840 5.054240  8.711160
# 2  3.129920 1.783160  3.385920
# 3  6.128080 2.173200  8.173560
# 4  6.781120 4.753280  8.731640
# 5  7.346560 5.893200 11.303040
# 6  7.208200 3.744000 10.075760
# 7  7.039440 5.213640  8.608840
# 8  5.465720 4.492640  5.596520
# 9  7.723240 4.708120 11.357480
# 10 7.109560 4.987520  7.732840
# 11 8.135800 4.392000  9.264840
# 12 6.287480 5.908720  7.488240
# 13 4.648880 7.198280  8.573720
# 14 5.798600 6.120080  8.254840
# 15 8.084560 3.234840  8.966560
# 16 6.157773 5.743455  9.899045
# 17 3.556727 2.026318  3.847636
# 18 6.963727 2.469545  9.288136
# 19 7.705818 5.401455  9.922318
# 20 8.348364 6.696818 12.844364
# 21 8.191136 4.254545 11.449727
# 22 7.999364 5.924591  9.782773
# 23 6.211045 5.105273  6.359682
# 24 8.776409 5.350136 12.906227
# 25 8.079045 5.667636  8.787318

Theory behind constructing a PCA biplot

Geometrically, the rows of \({\bf{X}}\) give the coordinates of the \(n\) samples in the \(p\)-dimensional space \(\mathbb{R}^p\).

The aim is to find an \(r\)-dimensional plane containing the points whose coordinates are given by the rows of \({\bf{\hat{X}}}_{[r]}\), minimising the least squares criterion \[\begin{equation} || {\bf{X}} - {\bf{\hat{X}}}_{[r]}||^2 = tr\{({\bf{X}} - {\bf{\hat{X}}}_{[r]})({\bf{X}} - {\bf{\hat{X}}}_{[r]})'\}. \end{equation}\]

The best approximation minimising this criterion is the \(r\)-dimensional Eckart-Young approximation \({\bf{\hat{X}}}_{[r]} = {\bf{U}} {\bf{D}}_{[r]} {\bf{V}}'\), where \({\bf{X}} = {\bf{U}}{\bf{D}}{\bf{V}}'\) is the singular value decomposition and \({\bf{D}}_{[r]}\) retains only the \(r\) largest singular values.
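The Eckart-Young approximation can be sketched in base R (this illustrates the theory; it is not biplotEZ's internal code):

```r
# Rank-2 Eckart-Young approximation via the singular value decomposition
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)                                        # X = U D V'
r <- 2
Xhat <- s$u[, 1:r] %*% diag(s$d[1:r]) %*% t(s$v[, 1:r])

# The least squares criterion equals the sum of the
# squared discarded singular values
sum((X - Xhat)^2)
sum(s$d[-(1:r)]^2)
```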

Representing samples

A standard result for \(r=2\) is that the row vectors of \({\bf{\hat{X}}}_{[2]}\) are the orthogonal projections of the corresponding row vectors of \({\bf{X}}\) onto the column space of \({\bf{V}}_2\). The projections are therefore,

\[\begin{equation} {\bf{X}} {\bf{V}}_2. \end{equation}\] These projections are also known as the first two principal components.

Representing variables

The columns of \({\bf{X}}\) are represented by the rows of \({\bf{V}}_2\) (the first two columns of \({\bf{V}}\)), which give the directions of the biplot axes for the variables.
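Both representations can be sketched in base R (again illustrative, not the package's internal code): the sample coordinates \({\bf{X}}{\bf{V}}_2\) match the first two columns of prcomp() scores up to sign, and the rows of \({\bf{V}}_2\) give the axis directions:

```r
X  <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
V2 <- svd(X)$v[, 1:2]

Z <- X %*% V2   # n x 2 sample coordinates (the first two principal components)
V2              # each row is the direction of one variable's biplot axis
```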

Calibrated biplot axes

We have constructed a biplot, but the variables represented by the vectors (arrows) have no calibration.

That is, there are no markers on the vectors representing the variables, analogous to the calibrated axes of an ordinary scatterplot.

To construct a biplot axis with relevant markers for a variable, a \((p-1)\)-dimensional hyperplane \(\mathscr{N}\) perpendicular to the Cartesian axis is required.

First plane

For this data set \(p = 3\); therefore, a two-dimensional hyperplane \(\mathscr{N}\) is constructed perpendicular to \(X_1\), passing through a specific value of \(X_1\), say \(\mu\).

The intersection of the biplot plane \(\mathscr{L}\) and \(\mathscr{N}\) is an \((r-1)\)-dimensional intersection space, which in this case is a line. All the points on this intersection line in \(\mathscr{L}\) predict the value \(\mu\) on the \(X_1\)-axis.

Second plane

The plane \(\mathscr{N}\) is then shifted orthogonally through another value on \(X_1\); all the points on the new intersection line of \(\mathscr{L}\) and \(\mathscr{N}\) predict the value through which the plane passes.

Intersection lines

As the plane \(\mathscr{N}\) is shifted along the \(X_1\)-axis, a series of parallel intersection spaces is obtained.

Any line passing through the origin will pass through these intersection spaces and can be used as an axis fitted with markers according to the value associated with the particular intersection space.

To facilitate orthogonal projection onto the axis, similar to an ordinary scatterplot, the line orthogonal to these intersection spaces is chosen.
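The resulting calibration can be sketched in base R (illustrative only): with prediction \(\hat{x}_{j} = {\bf{z}}'{\bf{b}}_j\), where \({\bf{b}}_j\) is row \(j\) of \({\bf{V}}_2\), the marker for value \(\mu\) lies at \(\mu\,{\bf{b}}_j / ({\bf{b}}_j'{\bf{b}}_j)\), so projecting a marker back onto its own axis recovers \(\mu\):

```r
X  <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
V2 <- svd(X)$v[, 1:2]
b  <- V2[1, ]                        # axis direction for Sepal.Length
mu <- seq(-2, 2, by = 0.5)           # marker values on the centred scale

markers <- outer(mu, b / sum(b^2))   # 2D coordinates of the axis markers
drop(markers %*% b)                  # projecting back recovers mu
```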

PCA function

PCA()
Argument            Description
bp                  An object of class biplot.
dim.biplot          Dimension of the biplot; only values 1, 2 and 3 are accepted, with default 2.
e.vects             Which eigenvectors (principal components) to extract, with default 1:dim.biplot.
group.aes           Grouping variable, if not already specified in biplot().
show.class.means    TRUE or FALSE: indicating whether group means should be plotted in the biplot, with default FALSE.
correlation.biplot  TRUE or FALSE: indicating whether distances or correlations between the variables are optimally approximated, with default FALSE.
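A sketch combining some of these arguments (assuming biplotEZ is loaded): extract principal components 1 and 3 and show the class means:

```r
biplot(iris[, 1:4], group.aes = iris[, 5], scaled = TRUE) |>
  PCA(e.vects = c(1, 3), show.class.means = TRUE) |>
  plot()
```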

Data

tibble(iris)
# # A tibble: 150 × 5
#    Sepal.Length Sepal.Width Petal.Length
#           <dbl>       <dbl>        <dbl>
#  1          5.1         3.5          1.4
#  2          4.9         3            1.4
#  3          4.7         3.2          1.3
#  4          4.6         3.1          1.5
#  5          5           3.6          1.4
#  6          5.4         3.9          1.7
#  7          4.6         3.4          1.4
#  8          5           3.4          1.5
#  9          4.4         2.9          1.4
# 10          4.9         3.1          1.5
# # ℹ 140 more rows
# # ℹ 2 more variables: Petal.Width <dbl>,
# #   Species <fct>

PCA biplot

biplot(data = iris, group.aes = iris[,5],
       Title = "My first biplot") |>

PCA biplot

biplot(data = iris, group.aes = iris[,5],
       Title = "My first biplot") |> 
  PCA() |> 
  plot()

Flow of functions in biplotEZ

Main Function

biplot()

Type of Biplot

PCA()
CVA()
PCO()
CA()

Aesthetics

samples()
axes()
newsamples()
newaxes()

Operations

prediction()
interpolate()
translate()
density()
fit.measures()
classify()
alpha.bags()
ellipses()
rotate()
reflect()
zoom()
regress()
splines()

Plotting

plot()

Aesthetics: samples()

Change the colour, plotting character and character expansion of the samples.

  samples(col = c("orange","purple","gold"), pch = c(15,1,17), cex = 1.2, 
          opacity = 0.6) |> 

Aesthetics: samples()

Change the colour, plotting character and character expansion of the samples.

biplot(iris, group.aes = iris[,5]) |> 
  PCA() |> 
  samples(col = c("orange","purple","gold"), pch = c(15,1,17), cex = 1.2, 
          opacity = 0.6) |> 
  plot()

Aesthetics: samples()

Select certain groups, and add labels to the samples.

biplot(iris, group.aes = iris[,5]) |> 
  PCA() |> 
  samples(which = c(1,2), col = c("orange","purple"), label = TRUE) |> 
  plot()

Aesthetics: samples()

Other arguments

Argument Description
label.col Colour of labels
label.cex Text expansion of the labels
label.side Side at which the label of the plotted point appears - “bottom” (default), “top”, “left”, “right”
label.offset Offset of the label from the plotted point
connected TRUE or FALSE: whether samples are connected, with default FALSE
connect.col Colour of the connecting line
connect.lty Line type of the connecting line
connect.lwd Line width of the connecting line
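A sketch using the connection arguments above (assuming biplotEZ is loaded): join the samples of the first group with a dotted grey line:

```r
biplot(iris, group.aes = iris[, 5]) |>
  PCA() |>
  samples(which = 1, col = "orange", connected = TRUE,
          connect.col = "grey", connect.lty = 3, connect.lwd = 1) |>
  plot()
```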

Aesthetics: axes()

Change the colour and line width of the axes

biplot(iris[,1:4]) |> PCA() |> 
  samples(col = "grey", opacity = 0.5) |>

Aesthetics: axes()

Change the colour and line width of the axes

biplot(iris[,1:4]) |> PCA() |> 
  samples(col = "grey", opacity = 0.5) |>
  axes(col = "rosybrown", label.dir = "Orthog", lwd = 2) |> 
  plot()

Aesthetics: axes()

Show the first two axes with vector representation and unit circle

biplot(iris[,1:4]) |> PCA() |> 
  samples(col = "grey", opacity = 0.5) |>
  axes(which = 1:2, col = "rosybrown", vectors = TRUE, unit.circle = TRUE) |> 
  plot()

Aesthetics: axes()

Other arguments

Axis labels
ax.names
label.dir
label.col
label.cex
label.line
label.offset

Ticks
ticks
tick.size
tick.label
tick.label.side
tick.label.col

Prediction
predict.col
predict.lwd
predict.lty

Orthogonal
orthogx
orthogy

Flow of functions in biplotEZ

Main Function

biplot()

Type of Biplot

PCA()
CVA()
PCO()
CA()

Aesthetics

samples()
axes()
newsamples()
newaxes()

Operations

prediction()
interpolate()
translate()
density()
fit.measures()
classify()
alpha.bags()
ellipses()
rotate()
reflect()
zoom()
regress()
splines()

Plotting

plot()

Prediction of samples

prediction()

out <- biplot(iris[,1:4], group.aes = iris[,5]) |> PCA() |> 
  samples(col = c("orange","purple","gold"), opacity = 0.5) |>

Prediction of samples

prediction()

out <- biplot(iris[,1:4], group.aes = iris[,5]) |> PCA() |> 
  samples(col = c("orange","purple","gold"), opacity = 0.5) |>
  prediction(predict.samples = c(1:2,51:52,101:102) )|>

Prediction of samples

prediction()

out <- biplot(iris[,1:4], group.aes = iris[,5]) |> PCA() |> 
  samples(col = c("orange","purple","gold"), opacity = 0.5) |>
  prediction(predict.samples = c(1:2,51:52,101:102) )|>
  axes(predict.col = "red", predict.lwd = 1.5, predict.lty = 2) |> plot()

Prediction of samples

Predict only on the variable Sepal.Length: use the which argument.

biplot(iris[,1:4], group.aes = iris[,5]) |> PCA() |> 
  samples(col = c("orange","purple","gold"), opacity = 0.5) |>
  prediction(predict.samples = c(1:2,51:52,101:102), which = "Sepal.Length")|>
  axes(predict.col = "red", predict.lwd = 1.5, predict.lty = 2) |> plot()

Prediction of group means

biplot(iris[,1:4], group.aes = iris[,5]) |> PCA(show.class.means = TRUE) |> 
  samples(col = c("orange","purple","gold"), opacity = 0.5) |>
  prediction(predict.means = TRUE) |>
  axes(predict.col = "red", predict.lwd = 1.5, predict.lty = 2) |> plot()

Predictions

summary(out)
# Object of class biplot, based on 150 samples and 4 variables.
# 4 numeric variables.
# 
# Sample predictions
#     Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1       5.083039    3.517414     1.403214   0.2135317
# 2       4.746262    3.157500     1.463562   0.2402459
# 51      6.757521    3.449014     4.739884   1.6079559
# 52      6.389336    3.210952     4.501645   1.5094058
# 101     6.751606    2.836199     5.928106   2.1069758
# 102     5.977297    2.517932     5.070066   1.7497923

Interpolation of samples

biplot(iris[1:100,]) |> PCA() |> 

Interpolation of samples

biplot(iris[1:100,]) |> PCA() |> 
  interpolate(newdata = iris[101:150,]) |> 

Interpolation of samples

biplot(iris[1:100,]) |> PCA() |> 
  interpolate(newdata = iris[101:150,]) |> 
  newsamples(col = "red") |> plot()

Interpolation of axes

biplot(iris[,1:3]) |> PCA() |> 
    interpolate(newdata = NULL, newvariable = iris[,4]) |> 
    newaxes(X.new.names = "Petal.Width") |> plot()

Translation

Automatically or manually translate the axes away from the centre of the plot.

biplot(iris)|> 
      PCA(group.aes = iris[,5]) |> 
      translate_axes(swop=TRUE, delta = 0.2)|> plot(exp.factor = 3)

Density plots

On the first group

biplot(iris[,1:4],group.aes = iris[,5]) |> PCA() |> 

Density plots

On the first group

biplot(iris[,1:4],group.aes = iris[,5]) |> PCA() |> 
  density2D(which = 1, col = c("white","purple","cyan","blue")) |> plot()

Density plots

On the second group, and adding contours

biplot(iris[,1:4], group.aes = iris[,5]) |> PCA() |> 
  density2D(which = 2, col = c("white","purple","cyan","blue"),
            contours = TRUE) |> plot()

Density plots

On the third group, and changing the colour of the contours.

biplot(iris[,1:4],group.aes = iris[,5]) |> PCA() |> 
  density2D(which = 3, col = c("white","purple","cyan","blue"),contours = TRUE,
            contour.col = "grey") |> plot()

Fit measures

out2 <- biplot(iris[,1:4],group.aes = iris[,5]) |> PCA() |> fit.measures()
summary(out2)
# Object of class biplot, based on 150 samples and 4 variables.
# 4 numeric variables.
# 
# Quality of fit in 2 dimension(s) = 97.8% 
# Adequacy of variables in 2 dimension(s):
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#    0.5617091    0.5402798    0.7639426    0.1340685 
# Axis predictivity in 2 dimension(s):
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#    0.9579017    0.8400028    0.9980931    0.9365937 
# Sample predictivity in 2 dimension(s):
#         1         2         3         4         5         6         7         8 
# 0.9998927 0.9927400 0.9999141 0.9991226 0.9984312 0.9949770 0.9914313 0.9996346 
#         9        10        11        12        13        14        15        16 
# 0.9998677 0.9941340 0.9991205 0.9949153 0.9945491 0.9996034 0.9942676 0.9897890 
#        17        18        19        20        21        22        23        24 
# 0.9937752 0.9990534 0.9972926 0.9928624 0.9896250 0.9932656 0.9918132 0.9955885 
#        25        26        27        28        29        30        31        32 
# 0.9812917 0.9897303 0.9979903 0.9990514 0.9963870 0.9975607 0.9985741 0.9876345 
#        33        34        35        36        37        38        39        40 
# 0.9833383 0.9957412 0.9970200 0.9935405 0.9859750 0.9953399 0.9994047 0.9990244 
#        41        42        43        44        45        46        47        48 
# 0.9980903 0.9756895 0.9953372 0.9830035 0.9763861 0.9959863 0.9905695 0.9987006 
#        49        50        51        52        53        54        55        56 
# 0.9996383 0.9987482 0.9275369 0.9996655 0.9544488 0.9460515 0.9172857 0.9061058 
#        57        58        59        60        61        62        63        64 
# 0.9727694 0.9996996 0.8677939 0.8686502 0.9613130 0.9328852 0.4345132 0.9679973 
#        65        66        67        68        69        70        71        72 
# 0.7995848 0.9083037 0.7968614 0.5835260 0.7900027 0.8575646 0.8524748 0.6615410 
#        73        74        75        76        77        78        79        80 
# 0.9367709 0.8661203 0.8350955 0.8929908 0.8702600 0.9873164 0.9969031 0.6815512 
#        81        82        83        84        85        86        87        88 
# 0.8937189 0.8409681 0.7829405 0.9848354 0.6901625 0.8073582 0.9666041 0.6665514 
#        89        90        91        92        93        94        95        96 
# 0.6993846 0.9909923 0.9008345 0.9710941 0.8037223 0.9913632 0.9744493 0.7089660 
#        97        98        99       100       101       102       103       104 
# 0.9071738 0.9064541 0.9625371 0.9872279 0.9171603 0.9636413 0.9976224 0.9829885 
#       105       106       107       108       109       110       111       112 
# 0.9854704 0.9888092 0.8464463 0.9729353 0.9771293 0.9794313 0.9746239 0.9977302 
#       113       114       115       116       117       118       119       120 
# 0.9941859 0.9605563 0.8476794 0.9289985 0.9929982 0.9916850 0.9818957 0.9493751 
#       121       122       123       124       125       126       127       128 
# 0.9865358 0.8716778 0.9728177 0.9846364 0.9840890 0.9861783 0.9854516 0.9691512 
#       129       130       131       132       133       134       135       136 
# 0.9942007 0.9585884 0.9705389 0.9937852 0.9874192 0.9723192 0.9230503 0.9794405 
#       137       138       139       140       141       142       143       144 
# 0.8947527 0.9797055 0.9458421 0.9902488 0.9674660 0.9350646 0.9636413 0.9867931 
#       145       146       147       148       149       150 
# 0.9500265 0.9470544 0.9688318 0.9886543 0.8735433 0.9281727